22 research outputs found

    On Reward Structures of Markov Decision Processes

    A Markov decision process can be parameterized by a transition kernel and a reward function. Both play essential roles in the study of reinforcement learning, as evidenced by their presence in the Bellman equations. In our inquiry into various kinds of "costs" associated with reinforcement learning, inspired by the demands of robotic applications, rewards are central to understanding the structure of a Markov decision process, and reward-centric notions can elucidate important concepts in reinforcement learning. Specifically, we study the sample complexity of policy evaluation and develop a novel estimator with an instance-specific error bound of $\tilde{O}(\sqrt{\tau_s/n})$ for estimating a single state value. Under the online regret minimization setting, we refine the transition-based MDP constant, diameter, into a reward-based constant, maximum expected hitting cost, and with it provide a theoretical explanation for how a well-known technique, potential-based reward shaping, could accelerate learning with expert knowledge. In an attempt to study safe reinforcement learning, we model hazardous environments with irrecoverability and propose a quantitative notion of safe learning via reset efficiency. In this setting, we modify a classic algorithm to account for resets, achieving promising preliminary numerical results. Lastly, for MDPs with multiple reward functions, we develop a computationally efficient planning algorithm that finds Pareto-optimal stochastic policies.
    Comment: This PhD thesis draws heavily from arXiv:1907.02114 and arXiv:2002.06299; minor edit

    Loop Estimator for Discounted Values in Markov Reward Processes

    At the working heart of policy iteration algorithms commonly used and studied in the discounted setting of reinforcement learning, the policy evaluation step estimates the value of states with samples from a Markov reward process induced by following a Markov policy in a Markov decision process. We propose a simple and efficient estimator called loop estimator that exploits the regenerative structure of Markov reward processes without explicitly estimating a full model. Our method enjoys a space complexity of $O(1)$ when estimating the value of a single positive recurrent state $s$, unlike TD with $O(S)$ or model-based methods with $O(S^2)$. Moreover, the regenerative structure enables us to show, without relying on the generative model approach, that the estimator has an instance-dependent convergence rate of $\widetilde{O}(\sqrt{\tau_s/T})$ over steps $T$ on a single sample path, where $\tau_s$ is the maximal expected hitting time to state $s$. In preliminary numerical experiments, the loop estimator outperforms model-free methods, such as TD(k), and is competitive with the model-based estimator.
    Comment: accepted to AAAI 202
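    The regenerative idea in this abstract can be illustrated with a minimal sketch. It assumes the estimator takes the plug-in form of the renewal identity $v(s) = \mathbb{E}[G] / (1 - \mathbb{E}[\gamma^{\tau}])$, where $G$ is the discounted reward collected over one loop from $s$ back to $s$ and $\tau$ is the loop length; the abstract does not spell out the exact formula, so this may differ from the authors' estimator. The function name `loop_estimate`, the `(state, reward)` path format, and the convention that the reward is collected on leaving a state are all assumptions made for illustration.

```python
def loop_estimate(path, s, gamma):
    """Plug-in estimate of the discounted value of state `s` from one sample path.

    `path` is an iterable of (state, reward) pairs, where `reward` is the reward
    collected at that step (an assumed convention). Only a handful of running
    scalars are kept, so memory does not grow with the number of states.
    """
    n_loops = 0          # number of completed loops s -> ... -> s
    sum_reward = 0.0     # sum over loops of discounted reward within the loop
    sum_discount = 0.0   # sum over loops of gamma ** loop_length
    in_loop = False      # True once s has been visited at least once
    g = 0.0              # discounted reward accumulated in the current loop
    disc = 1.0           # gamma ** (steps since the last visit to s)

    for state, reward in path:
        if state == s:
            if in_loop:
                # A loop just closed: record its statistics.
                n_loops += 1
                sum_reward += g
                sum_discount += disc
            in_loop = True
            g, disc = 0.0, 1.0
        if in_loop:
            g += disc * reward
            disc *= gamma

    if n_loops == 0:
        raise ValueError("path contains no completed loop through s")

    mean_reward = sum_reward / n_loops
    mean_discount = sum_discount / n_loops
    return mean_reward / (1.0 - mean_discount)


# Usage: a deterministic two-state cycle 0 -> 1 -> 0 -> ... with reward 1 in state 0.
cycle = [(0, 1.0), (1, 0.0)] * 5
estimate = loop_estimate(cycle, s=0, gamma=0.5)
```

    On this cycle every loop has length 2 and discounted reward 1, so the estimate coincides with the exact value $1/(1 - \gamma^2)$.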

    Network analysis identifies a putative role for the PPAR and type 1 interferon pathways in glucocorticoid actions in asthmatics

    Background: Asthma is a chronic inflammatory airway disease influenced by genetic and environmental factors that affects ~300 million people worldwide, leading to ~250,000 deaths annually. Glucocorticoids (GCs) are well-known therapeutics that are used extensively to suppress airway inflammation in asthmatics. The airway epithelium plays an important role in the initiation and modulation of the inflammatory response. While the role of GCs in disease management is well understood, few studies have examined the holistic effects on the airway epithelium. Methods: Gene expression data were used to generate a co-transcriptional network, which was interrogated to identify modules of functionally related genes. In parallel, expression data were mapped to the human protein-protein interaction (PPI) network in order to identify modules with differentially expressed genes. A common pathways approach was applied to highlight genes and pathways functionally relevant and significantly altered following GC treatment. Results: Co-transcriptional network analysis identified pathways involved in inflammatory processes in the epithelium of asthmatics, including the Toll-like receptor (TLR) and PPAR signaling pathways. Analysis of the PPI network identified RXRA, PPARGC1A, STAT1 and IRF9, among other genes, as differentially expressed. Common pathways analysis highlighted TLR and PPAR signaling pathways, providing a link between general inflammatory processes and the actions of GCs. Promoter analysis identified genes regulated by the glucocorticoid receptor (GCR) and PPAR pathways and highlighted the interferon pathway as a target of GCs. Conclusions: Network analyses identified known genes and pathways associated with inflammatory processes in the airway epithelium of asthmatics. This workflow illustrated a hypothesis-generating experimental design that integrated multiple analysis methods to produce a weight-of-evidence-based approach upon which future focused studies can be designed. In this case, results suggested a mechanism whereby GCs repress TLR-mediated interferon production via upregulation of the PPAR signaling pathway. These results highlight the role of interferons in asthma and their potential as targets of future therapeutic efforts.